A Systematic Prospective Comparison of Fluid Volume Evaluation across OCT Devices Used in Clinical Practice

Objective: Treatment decisions in neovascular age-related macular degeneration (nAMD) are mainly based on subjective evaluation of OCT. The purpose of this cross-sectional study was to provide a comparison of qualitative and quantitative differences between OCT devices in a systematic manner.
Design: Prospective, cross-sectional study.
Subjects: One hundred sixty OCT volumes, 40 eyes of 40 patients with nAMD.
Methods: Patients from clinical practice were imaged with 4 different OCT devices during one visit: (1) Spectralis Heidelberg; (2) Cirrus; (3) Topcon Maestro2; and (4) Topcon Triton. Intraretinal fluid (IRF), subretinal fluid (SRF), and pigment epithelial detachment (PED) were manually annotated in all cubes by trained human experts to establish fluid measurements based on expert-reader annotations. Intraretinal fluid, SRF, and PED volume were quantified in nanoliters (nL). Bland–Altman plots were created to analyze the agreement of measurements in the central 1 and 6 mm. The Friedman test was performed to test for significant differences in the central 1, 3, and 6 mm.
Main Outcome Measures: Intraretinal fluid, SRF, and PED volume.
Results: In the central 6 mm, there was a trend toward higher IRF and PED volumes in Spectralis images compared with the other devices and no differences in SRF volume. In the central 1 mm, the standard deviation of the differences ranged from ± 3 nL to ± 6 nL for IRF, from ± 3 nL to ± 4 nL for SRF, and from ± 7 nL to ± 10 nL for PED in all pairwise comparisons. Manually annotated IRF and SRF volumes showed no significant differences in the central 1 mm.
Conclusions: Fluid volume quantification achieved excellent reliability in all 3 retinal compartments on images obtained from 4 OCT devices, particularly for clinically relevant IRF and SRF values. Although fluid volume quantification is reliable in all 4 OCT devices, switching OCT devices might lead to deviating fluid volume measurements with higher agreement in the central 1 mm compared with the central 6 mm, with highest agreement for SRF volume in the central 1 mm. Understanding device-dependent differences is essential for expanding the interpretation and implementation of pixel-wise fluid volume measurements in clinical practice and in clinical trials.
Financial Disclosure(s): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Neovascular age-related macular degeneration (nAMD) is a chronic disease that is characterized by fluid accumulation in the intraretinal space (intraretinal fluid [IRF]), the subretinal space (subretinal fluid [SRF]), and below the retinal pigment epithelium (RPE), referred to as pigment epithelial detachment (PED). 1 Intravitreal anti-VEGF therapy is the gold standard treatment for nAMD, but it requires regular monitoring of disease activity and achieves poorer outcomes in routine clinical care than in clinical studies. 2–5 With a growing elderly population, optimization of age-related macular degeneration management would relieve a health care system overburdened with an estimated global prevalence of 170 million patients with this condition, expected to rise to 288 million by the year 2040. 6,7 Furthermore, upcoming therapies in exudative and nonexudative macular disease will increase the burden on hospitals and the demand for automated support systems. 8,9 Consequently, reliable, high-quality diagnostic devices and precise biomarker assessment are essential for timely disease detection and personalized treatment decisions. 10–12 OCT is the most powerful, noninvasive modality for imaging the retina. 13,14 The hardware and software of OCT devices have rapidly evolved since its introduction for ocular axial length measurements in 1988. 15 The progress from time-domain to Fourier-domain imaging technology increased the scanning speed and enabled higher B-scan rates, resulting in 3-dimensional volume images. 16 This crucial step led to a higher consensus in clinical interpretation and faster detection of disease activity. 16–18 Hence, study end points and treatment decisions in clinical practice are routinely based on macular structure analyses on swept-source (SS)-OCT and spectral domain (SD)-OCT devices. 
10,13,14,17 Swept-source OCT uses longer center wavelengths for faster acquisition speed and deeper light penetration into the eye and therefore has optimized choroidal visualization with reduced axial resolution compared with SD-OCT. 19 Clinical trials and OCT analyses in the current literature frequently encompass the SD-OCT manufacturers Zeiss, Heidelberg, and Topcon. 20 Concurrently, the combination of the volumetric display of retinal morphology with OCT angiography on SS-OCT systems is broadly applied in the clinic and examined throughout the literature. 1,21 In recent years, validation of automated algorithms for OCT biomarker quantification has continuously demonstrated that artificial intelligence (AI) is able to extract and quantify information from OCT volumes on a voxel level in a fast and objective manner and performs on par with human experts. 22–25 Currently, deep-learning algorithms are being developed on devices from different manufacturers and are most prevalently implemented on SD-OCTs from Zeiss, Heidelberg, and Topcon, 26 while developments on SS-OCTs, such as the Topcon Triton, are also explored. 27 The use of different OCT systems for AI development still represents a challenge because algorithms need to be trained and validated based on device-specific characteristics for optimal performance. 28 In the novel era of AI in the retina, personalized nAMD treatment will depend on consistent biomarker quantification throughout this major spectrum of frequently used OCTs. 
29 However, human expert annotations are the gold standard for training AI algorithms for biomarker quantification. In this study, fluid volume was quantified in 3 commonly used SD-OCTs, Zeiss Cirrus, Heidelberg Spectralis, and Topcon Maestro2, and one SS-OCT, Topcon Triton. Retinal fluid volumes were compared between these commercially available devices based on human expertise. The comparison of fluid volume measurements throughout commonly used OCT systems is an essential step for expanding the application of AI algorithms in clinical practice. To our knowledge, this is the first work that compares human expert annotations of fluid in nAMD between commonly used OCT devices.

Image Analysis
Retinal Fluid Volume Evaluation. An AI-based algorithm automatically segmented the fluid compartments, with each voxel classified by a multiscale convolutional neural network. In short, this convolutional neural network applies deep learning to map OCT images to pixel-level class labels based on large amounts of labeled training data. Semantic segmentation allows the neural network to map an input image of a specific size to an image of class labels of the same size. This is based on an encoder that transforms an input image into an abstract representation and a decoder that maps the abstract representation to an image of clinical class labels. Each pixel is thereby assigned the label IRF, SRF, PED, or healthy tissue. 30 Pigment epithelial detachment was segmented based on a previously validated algorithm that segments the region between the RPE and Bruch's membrane (BM). 31 The algorithm was trained and validated as described previously. 30,31 Manual pixel-wise corrections of the AI-based segmentation of IRF, SRF, and PED were performed by an expert reader (K.K.) trained according to reading center standards. This ensured that the devices were compared based on human expertise rather than on algorithm performance on a specific device, as deep-learning algorithms are not yet validated for all 4 devices used in this study. Difficult cases were discussed in a group with senior retina specialists (V.M. and G.R.) 
until consensus was reached. The reader was masked to the segmentations on the other devices and performed all gradings independently for each device. In the manual grading protocol, IRF was defined as distinct hyporeflective regions within the neurosensory retina, including all layers between the internal limiting membrane and the ellipsoid zone. Subretinal fluid was defined as a hyporeflective space between the ellipsoid zone and the RPE. Pigment epithelial detachment was defined as an elevation of the RPE from BM with fibrovascular and/or serous components. The threshold for minimum PED width was set at 300 µm, which is 50 µm narrower than previously defined to avoid identification of borderline PEDs, if present. 1 There was no threshold for PED height. Once a defined PED was marked, annotations of the same lesion were continued in adjacent B-scans regardless of the lesion size. Figure 1 demonstrates examples of manual pixel-wise annotations for each device.
In Spectralis, corrections were made in the entire macular OCT volume of 97 B-scans (3880 B-scans in total). In Cirrus and Maestro, every second B-scan was annotated, manually correcting 64 B-scans per OCT volume (2560 B-scans/device in total). For Triton, the reader corrected every fourth B-scan, manually marking 64 B-scans per OCT volume (2560 B-scans in total). Fluid volume measurements were only calculated in manually corrected B-scans. B-scans that were not corrected were removed from the measurements because it has been demonstrated previously that there is no significant difference in fluid volume calculations between 64 B-scans and 97 B-scans. 32 The IRF, SRF, and PED volumes were computed in the common Early Treatment Diabetic Retinopathy Study (ETDRS) macular grid in the central 1, 3, and 6 mm and analyzed in nanoliters (nL). The IRF, SRF, and PED volumes were summed to determine the total fluid volume (TFV) for each OCT volume. The position of the fovea was set manually in each OCT volume as a reference point for volume comparison.
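The volume computation described above can be illustrated with a minimal sketch. It assumes a binary fluid mask indexed as mask[b][z][x] (annotated B-scan, axial pixel, A-scan), voxel spacing given in millimeters, and a manually set fovea position in en-face coordinates. The function name, argument names, and toy spacings are hypothetical and are not taken from the software used in the study. Because only annotated B-scans enter the calculation, the B-scan spacing here is the distance between annotated B-scans.

```python
import math

def fluid_volume_nl(mask, spacing_mm, fovea_xy_mm, diameter_mm):
    """Fluid volume in nL inside a circle of the given diameter around the fovea.

    mask[b][z][x] is True where a voxel is labeled as fluid (hypothetical layout).
    spacing_mm = (spacing between annotated B-scans, axial pixel size, A-scan spacing).
    """
    dy, dz, dx = spacing_mm
    voxel_nl = dy * dz * dx * 1000.0  # 1 mm^3 = 1 microliter = 1000 nL
    fx, fy = fovea_xy_mm
    radius = diameter_mm / 2.0
    count = 0
    for b, bscan in enumerate(mask):
        y = b * dy  # en-face position of this B-scan
        for row in bscan:
            for a, is_fluid in enumerate(row):
                x = a * dx  # en-face position of this A-scan
                if is_fluid and math.hypot(x - fx, y - fy) <= radius:
                    count += 1
    return count * voxel_nl

# Toy cube: 3 B-scans, 2 axial pixels, 3 A-scans, every voxel labeled as fluid.
toy_mask = [[[True] * 3 for _ in range(2)] for _ in range(3)]
toy_spacing = (0.5, 0.01, 0.5)   # assumed spacings in mm, for illustration only
toy_fovea = (0.5, 0.5)           # fovea set at the center of the toy cube

central_1mm = fluid_volume_nl(toy_mask, toy_spacing, toy_fovea, 1.0)
central_6mm = fluid_volume_nl(toy_mask, toy_spacing, toy_fovea, 6.0)
```

Under these assumptions, the total fluid volume of an OCT volume would simply be the sum of the values returned for the IRF, SRF, and PED masks.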

Statistical Analysis
This is an explorative data analysis. Descriptive statistics were calculated for each retinal fluid compartment in the central 1 and 6 mm. Bland–Altman plots were created to analyze the limits of measurement agreement and the presence of systematic bias between 2 devices, separately for all 3 fluid compartments in the central 1 and 6 mm. For SRF, the agreement of fluid measurements was additionally examined within a 10-nL threshold in the central 1 mm because SRF-tolerating regimens have been discussed in the recent literature. 12,33 The Friedman test, a nonparametric test for dependent samples, with post hoc pairwise comparisons using Bonferroni correction, was performed to test for significant differences in IRF, SRF, PED, and TFV between all 4 devices in the central 1, 3, and 6 mm. Intraclass correlation coefficients (ICCs) and their 95% confidence intervals (CIs) were calculated based on a mean-rating (k = 4), consistency, 2-way mixed-effects model. The data were analyzed with SPSS statistical software. Statistical significance was set at P < 0.05.
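The agreement measures named above can be sketched in a few lines of standard-library Python. The study itself used SPSS, so this is only an illustration under stated assumptions: the 95% limits of agreement are taken as the mean difference ± 1.96 times the sample SD of the differences, and the ICC uses the textbook mean-rating, consistency, 2-way mixed-effects formula (MS_rows − MS_error) / MS_rows. All function names and the example values are hypothetical.

```python
import statistics

def bland_altman(a, b):
    """Mean difference, SD of differences, and 95% limits of agreement for paired volumes."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return mean_diff, sd, (mean_diff - 1.96 * sd, mean_diff + 1.96 * sd)

def fraction_within(a, b, threshold_nl=10.0):
    """Share of paired measurements whose absolute difference is within the threshold."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return sum(d <= threshold_nl for d in diffs) / len(diffs)

def icc_consistency_mean(ratings):
    """Mean-rating, consistency, 2-way mixed-effects ICC.

    ratings holds one row per eye and one column per device.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows

# Made-up SRF volumes (nL) from two devices, for illustration only.
device_a = [10.0, 20.0, 30.0, 40.0]
device_b = [12.0, 18.0, 33.0, 37.0]
mean_diff, sd_diff, limits = bland_altman(device_a, device_b)
share_within_10nl = fraction_within(device_a, device_b, 10.0)
```

Because the limits of agreement scale directly with the SD of the differences, a single extreme pair can widen them considerably in a cohort of this size.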

Results
A total of 160 OCT volumes from 40 eyes of 40 patients with 11 560 corrected B-scans were analyzed. Twenty-four patients (60%) were female, and 16 (40%) were male. The mean patient age was 78.85 ± 7.3 years. Descriptive statistics for IRF, SRF, and PED in the central 1 and 6 mm are summarized in Table 2.

Qualitative Differences in Fluid Volumes
Figure 1 demonstrates the qualitative differences of all 4 OCT devices with pixel-wise human expert annotations. Figure 1B, F shows examples of clearly delineated IRF and SRF borders in all OCT devices with very similar pixel-wise expert reader gradings. Figure 1D, H shows examples with unclear IRF and SRF borders with differences in pixel-wise expert reader annotations, especially for IRF (Fig 1H). The B-scans from different devices vary in their signal-to-noise ratio, axial resolution, reflectivity, and contrast. Spectralis' B-scan averaging and higher signal-to-noise ratio facilitated the recognition of the IRF borders, the ellipsoid zone as the SRF border, and BM as the PED border in challenging cases.

Evaluation in the Wider 6-mm Area
The results from all pairwise comparisons in the central 6 mm for all fluid compartments are summarized in Table 3, including the 95% limits of measurement agreement, standard deviation (SD) of differences, and difference of means (dM). There was a trend toward higher IRF volume measurements in Spectralis compared with Maestro, Cirrus, and Triton, as graphically displayed in Figure 2 for each device comparison separately. Agreement within the limits of measurement agreement was lower at higher IRF volumes in the central 6 mm. The highest dM was calculated between Maestro and Spectralis (−20 nL, SD ± 51 nL), followed by Cirrus and Spectralis (−17 nL, SD ± 33 nL) and Triton and Spectralis (−15 nL, SD ± 40 nL). For SRF, there was no trend or bias in any of the pairwise comparisons (Table 3), whereas PED volume showed high differences in all pairwise comparisons with a trend toward higher volume measurements in Spectralis compared with the 3 other devices (Fig 3). For PED, the highest dM was measured between Triton and Spectralis (−63 nL, SD ± 106 nL), followed by Cirrus and Spectralis (−47 nL, SD ± 107 nL), Maestro and Spectralis (−32 nL, SD ± 97 nL), and the lowest between Cirrus and Maestro (dM −15 nL, SD ± 53 nL).

Evaluation in the Central 1-mm Area
The results from all pairwise comparisons in the central 1 mm for all fluid compartments are summarized in Table 3. For IRF, the SD of the differences was between ± 3 nL and ± 6 nL with dM between 0.2 and 2 nL. For SRF, the SD of the differences was between ± 3 nL and ± 4 nL with a dM 1 nL in all pairwise comparisons. Pigment epithelial detachment had the highest SD and dM in the central 1 mm, with SD of differences between ± 7 nL and ± 10 nL and dM between 0.7 nL and 3 nL.

Evaluation of Differences between 1-mm and 6-mm Areas
The Friedman test showed no significant differences in IRF volume in the central 1, 3, and 6 mm.
The SRF volumes did not differ significantly in the central 1 mm. In the central 6 mm, there were significant differences in SRF volume between Triton and Spectralis (P = 0.026), Triton and Cirrus (P = 0.004), and Triton and Maestro (P = 0.004). In the central 3 mm, SRF volume differed significantly between Triton and Spectralis (P = 0.038) and Triton and Maestro (P = 0.002).
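For readers unfamiliar with the test behind these P values, the following is a minimal standard-library sketch of a Friedman test across 4 devices with a Bonferroni adjustment for the 6 possible device pairs. It is not the SPSS procedure used in the study: the statistic is computed without a tie correction, the P value relies on the asymptotic chi-square distribution with 3 degrees of freedom (valid only for exactly 4 devices), and the pairwise post hoc tests themselves are not reproduced. All function names and the toy table are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def friedman_statistic(table):
    """Friedman chi-square statistic; table holds one row per eye, one column per device."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up volumes (nL): 3 eyes, 4 devices, identical device ordering in every eye.
toy_table = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [1.0, 3.0, 5.0, 7.0]]
toy_stat = friedman_statistic(toy_table)
toy_p = chi2_sf_df3(toy_stat)
```

With 40 eyes the asymptotic approximation is generally considered adequate, whereas the 3-eye toy table only serves to show the mechanics.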

Impact of B-Scan Rate
The influence of B-scan rate on fluid volume was further examined in the Spectralis device. No statistically significant differences in TFV, IRF, SRF, and PED volume were found between 64 B-scans and 97 B-scans in the central 1, 3, and 6 mm.
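Why a sparser B-scan sampling can yield similar volumes may be easier to see with a small sketch. It assumes that the fluid area per B-scan is already known and that each retained B-scan stands for a slab whose thickness equals the spacing between retained B-scans. The integer subsampling step, the function name, and the toy areas are illustrative assumptions and do not describe how the 64-B-scan Spectralis subset was derived in the study.

```python
def volume_from_areas(areas_mm2, spacing_mm, step=1):
    """Estimate fluid volume (nL) from the fluid area of every `step`-th B-scan.

    Each retained B-scan represents a slab of thickness step * spacing_mm.
    """
    kept = areas_mm2[::step]
    return sum(kept) * spacing_mm * step * 1000.0  # 1 mm^3 = 1000 nL

# Made-up fluid areas (mm^2) in 8 consecutive B-scans spaced 0.05 mm apart.
toy_areas = [0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.0, 0.0]
dense_estimate = volume_from_areas(toy_areas, 0.05, step=1)
sparse_estimate = volume_from_areas(toy_areas, 0.05, step=2)
```

In this toy case the dense estimate is about 45 nL and the sparse one about 40 nL, which shows that subsampling changes the estimate only as far as the skipped B-scans deviate from their neighbors.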

Discussion
A data set of 160 OCT volumes with 11 560 manually annotated B-scans was analyzed in this cross-sectional study. The goal of this study was to quantify retinal fluid in frequently used SD-OCT devices and one SS-OCT device in clinical practice to establish whether IRF, SRF, and PED volumes are accurately quantifiable and comparable across devices. Understanding the device-specific characteristics facilitates the interpretation of our results. Cirrus and Maestro have similar acquisition speeds, the same B-scan spacing, and comparable center wavelengths and axial resolution. For Cirrus, pupil position and focus have to be set by the examiner, whereas Maestro acquires the scan automatically after the pupil position has been identified manually. The position of the macular cube cannot be moved on the Maestro device, and quality may suffer from poor foveal centration in noncompliant patients. The only SS-OCT in this study, Triton, scans the retina faster than the SD-OCT devices, has the highest number of B-scans, and allows for better visualization of the choroid. In Triton, pupil position and focus are manually controlled by the examiner, whereas the position of the macular cube cannot be moved manually, which might lead to worse foveal centration as described previously. Motion artifacts are minimized by the faster acquisition speed. For Spectralis, multiple image settings can be chosen. The examiner controls the position of the macular cube, pupil position, focus, and illumination during the exam. These manual adjustments require experience and expertise and are crucial for good image quality. In the high-resolution mode with B-scan averaging set at 16 frames, motion artifacts are prevented by the averaging. However, the longer acquisition time is strenuous and requires the patient's concentration. In summary, Maestro and Cirrus are comparable devices with regard to fluid volume measurements, quality, and user experience, whereas Spectralis produces B-scans with 
the highest signal-to-noise ratio and easier fluid delineation. Triton differs most from the other devices because of the different size of the macular cube (7 × 7 mm compared with 6 × 6 mm on Spectralis, Cirrus, and Maestro), its technical fundamentals with the highest B-scan rate (256 B-scans compared with 128 in Cirrus and Maestro2 and 97 B-scans in Spectralis), and its acquisition speed.
Analyses of the Bland–Altman plots showed the highest agreement for SRF in the central 1 mm and a potentially clinically significant difference in IRF and PED volumes in the central 6 mm with a trend toward higher measurements in Spectralis compared with the other devices. No significant differences in IRF and SRF volume in the central 1 mm were found, whereas PED volume differed significantly between OCT devices. We demonstrated that fluid is quantifiable and comparable between Spectralis, Cirrus, and 2 Topcon OCTs with excellent reliability, with ICC values above 0.94 for all fluid compartments in the central 6 mm. Total fluid volume analysis was used as an additional outcome parameter and demonstrated excellent reliability with significant differences between devices with less or more background noise (Spectralis and Cirrus) and between SS-OCT and SD-OCT (Spectralis and Triton, Maestro and Triton). For clinical outcomes in clinical practice, each fluid compartment influences the morphological and functional outcomes differently. 34 Intraretinal fluid and SRF, compartments that trigger treatment decisions with anti-VEGF, showed no significant differences between any of the devices in the clinically significant central 1 mm. A trend toward higher IRF volume measurements in Spectralis in the central 6 mm can be explained by difficulties in clearly distinguishing IRF borders in other devices. Subretinal fluid did not exhibit any bias or trend between any 2 devices in the Bland–Altman plots in the central 1 mm. More importantly, SRF had the narrowest 95% limits of agreement with values under 10 nL in the central 1 mm in all pairwise SRF volume comparisons (Table 3). Additionally, 98% to 100% of SRF volume differences in all 6 pairwise comparisons were within a 10-nL threshold in the central 1 mm. This finding is of high relevance, as SRF-tolerating treatment regimens are currently discussed controversially. 
12,33 Currently, there is no evidence on clinically relevant thresholds for fluid volumes. However, thresholds are being examined throughout the literature on automated fluid quantification, 12,35 similar to thresholds for central retinal thickness (CRT) in current treatment regimens. 36 The threshold of 10 nL in the central 1 mm cannot yet be translated to clinical practice but allows for a more thorough understanding of the data analyzed in this paper and therefore broadens our understanding of fluid volume quantification. We postulate that differences in SRF volume in the central 6 mm between Triton and Spectralis, Cirrus, and Maestro are due to the very distinctive imaging pattern of the SS-OCT, Triton, with the most deviating B-scan spacing and a different imaging area. Nevertheless, we conclude that SRF volume is the fluid compartment with the highest agreement between different OCT manufacturers, which is reflected in the 95% limits of agreement from the Bland–Altman plots in the central 1 and 6 mm after one single outlier correction (Table 3). Pigment epithelial detachment presence or PED volume generally does not influence treatment decisions in clinical practice. However, PED volume impacts visual acuity 37 and differed significantly between OCTs from various manufacturers in this analysis. Analyses of the Bland–Altman plots demonstrated that the highest volume differences were found in PED measurements in the central 1 and 6 mm (Table 3). These volume differences arise from differences in the identification of BM, which might lead to overcorrection or undercorrection of PEDs in different OCT devices; BM is easier to recognize with less background noise. Additionally, as PEDs were marked in almost each B-scan according to our annotation protocol, minimal deviations of the corrected anatomical region might result in differences in the calculated PED volume.
To date, the only established quantitative biomarkers on OCT are CRT and central subfield thickness, which show weak correlations with visual acuity in nAMD. 38,39 Substantial differences in CRT and central subfield thickness in different OCT devices have been reported by several groups. 40,41 Furthermore, the highest variability in retinal thickness occurs in areas most affected by macular edema. 40 The boundaries of retinal layers in the automated CRT software differ between Spectralis, Cirrus, and Topcon OCT devices and measure significantly different central subfield thickness values in nAMD. 41 Thus, automated CRT measurements from device-dependent software cannot be used interchangeably between different OCTs without manual readjustments. 42,43 Our findings are of high relevance not only for multicenter clinical trials with imaging protocols on different OCT devices but also for treatment routine in nAMD. Several research groups are developing and implementing AI-based fluid quantification on different OCT devices in data sets from nAMD patients from clinical settings and trials. 25,30,37 Additionally, personalized treatment with AI-based fluid quantification is underway in clinical practice as a decision support. 12,44 Early findings suggest that fluid volume is a precise and objective biomarker that allows for individual treatment monitoring of nAMD activity with lower levels of fluid volume being associated with superior visual outcomes. 34 In the real world, varying acquisition protocols, background noise, and gray scales of fluid are challenges in the unification of AI algorithm performance. 
45 Fluid volume quantification in the clinic can only be based on automated algorithms because manual IRF, SRF, and PED delineation in OCT B-scans is not feasible in busy clinical practice. Currently, most AI algorithms for automated fluid segmentation are trained on data sets manually annotated by human expert readers. Therefore, it is of utmost importance to analyze and define the fluid volume differences between commonly used OCT machines based on human expertise. Based on our results and previous work from the literature, we postulate that fluid quantification is dependent on proper fovea centration, B-scan spacing, and signal-to-noise ratio of the respective device used. Lower IRF volume in the central 6 mm is measured in devices with lower signal-to-noise ratio compared with higher signal-to-noise ratio. Because IRF has been proven to be a fluid compartment with the highest impact on anatomical and functional outcomes, 34 clinicians should consider these device-dependent changes in IRF volume.
Strengths of this analysis include the large study cohort imaged with multiple devices during the same visit and optimal expert reader annotations of all compartments. Moreover, we present the first quantitative comparison of nAMD fluid biomarkers in different OCTs based on trained reader expertise in the 4 most frequently used OCT devices.
Artificial intelligence combined with manual annotations by human graders provides the most robust evidence.
This study has limitations that should be considered when interpreting the data. First, analyses of the limits of agreement in the Bland–Altman plots in a cohort of 40 patients may lead to misinterpretation, as limits of agreement are calculated based on the SD of the differences between 2 devices. Therefore, outliers have a great impact on the calculated limits of agreement. Removing these outliers from the measurements would not mirror the reality in clinic. A post hoc analysis of the limits of agreement without this one particular outlier is summarized in the legend of Table 3. In our opinion, these results are closer to the real differences in SRF volume in the central 6 mm.
Second, the spacing of every second (Maestro, Cirrus) and every fourth (Triton) B-scan from the top to the bottom of the macular cube was chosen to compare the volume in the same number of B-scans in these 3 devices because the scanning density is 2 times higher in Triton than in Maestro and Cirrus. Consequently, minimal deviations in the position of the macular cube have an impact on the anatomical region that is analyzed in one single B-scan. Lack of an interdevice follow-up function means that the position of the macular cube might deviate minimally between the devices. However, previous studies demonstrated that no statistically significant differences in fluid volume are observed between 128 and 64 B-scans, and a minimum of 16 B-scans is sufficient to generate comparable volume maps. 32 Considering that pixel-wise manual corrections of one OCT volume require between 3 and 8 hours for experienced readers, annotating each of the 256 B-scans (Triton) and 128 B-scans (Maestro, Cirrus) would decrease the feasibility of these analyses without adding any pivotal value. The impact of B-scan density in this study cohort was further investigated in our subanalysis in Spectralis. No significant differences were found between Spectralis 64 B-scans and Spectralis 97 B-scans. However, with the prospect of applying these results to clinical practice, the standardized B-scan spacing in this study needs to be considered. With AI implementation in the clinic on each device, clinicians should be aware that narrower B-scan spacing could influence fluid volume calculations, as AI-based fluid segmentation in clinical practice is not standardized and is performed on each available B-scan. 23 The third limitation is that although manual grading was performed with certified human expertise, a subjective aspect is inevitable in difficult cases. Such subjectivity would be reduced by reliable automated tools for each OCT device.
In conclusion, although fluid volume quantification is reliable in all 4 OCT devices, switching OCT devices might lead to different fluid volume measurements. However, there may be higher agreement in the central 1 mm compared with the central 6 mm, as summarized for each retinal fluid compartment. Understanding device-dependent fluid volume differences is essential for expanding the implementation and interpretation of AI-based fluid quantification in clinical trials and practice.

Figure 1. Pixel-wise expert reader intraretinal fluid (IRF), subretinal fluid (SRF), and pigment epithelial detachment annotations. A, Inner border of SRF easily distinguishable in all 4 OCT scans. B, Pixel-wise annotations with very similar labels in all 4 devices. C, Inner borders of SRF are hard to distinguish on Cirrus and Maestro but clearly delineated in Spectralis. D, Pixel-wise annotations are similar on all 4 devices despite qualitative differences. E, The IRF borders are delineated in all 4 OCT devices. F, Pixel-wise annotations with very similar labels in all 4 devices. G, The IRF borders are delineated in Spectralis but hard to distinguish in Cirrus, Maestro, and Triton. H, Pixel-wise annotations show qualitative differences.

Table 1. Image Settings and Technical Differences in All 4 OCT Devices
† Swept-source OCT.